Abstract



Sudderth, Wainwright, and Willsky have conjectured that the Bethe approximation corresponding
to any fixed point of the belief propagation algorithm over an attractive, pairwise
binary graphical model provides a lower bound on the true partition function. In this work, we
resolve this conjecture in the affirmative by demonstrating that, for any graphical model with
binary variables whose potential functions (not necessarily pairwise) are all log-supermodular,
the Bethe partition function always lower bounds the true partition function. The proof of this
result follows from a new variant of the "four functions" theorem that may be of independent
interest.

1 Introduction

Graphical models have proven to be a useful tool for performing approximate inference in a wide
variety of application areas including computer vision, combinatorial optimization, statistical
physics, and wireless networking. Computing the partition function of a given graphical model, a
typical inference problem, is an NP-hard problem in general. Because of this, the inference problem
is often replaced by a variational approximation that is, hopefully, easier to solve. The Bethe
approximation, one such standard approximation, is of great interest both because of its practical
performance and because of its relationship to the belief propagation (BP) algorithm: stationary
points of the Bethe free energy function correspond to fixed points of belief propagation [1].
However, the Bethe partition function is only an approximation to the true partition function and
need not provide an upper or lower bound.

In certain special cases, the Bethe approximation is conjectured to provide a bound on the true 
partition function. One such example is the class of attractive pairwise graphical models: models in 
which the interaction between any two neighboring variables places a greater weight on assignments 
in which the two variables agree. Many applications in computer vision and statistical physics can 
be expressed as attractive pairwise graphical models (e.g., the ferromagnetic Ising model). Sudderth, 
Wainwright, and Willsky [2] used a loop series expansion of Chertkov and Chernyak [3] in order
to study the fixed points of BP over attractive graphical models. They provided conditions on the
fixed points of BP under which the value of the Bethe approximation at the corresponding
stationary point of the Bethe free energy function is a lower bound on the true partition function. Empirically, they observed
that, even when their conditions were not satisfied, the Bethe partition function appeared to lower 
bound the true partition function, and they conjectured that this is always the case for attractive 
pairwise binary graphical models. 

Recent work on the relationship between the Bethe partition function and the graph covers of 
a given graphical model has suggested a new approach to resolving this conjecture. Vontobel [5] 
demonstrated that the Bethe partition function can be precisely characterized by the average of the
true partition functions corresponding to covers of the base graphical model. The primary contribu- 
tion of the present work is to show that, for graphical models with log-supermodular potentials, the 
partition function associated with any graph cover of the base graph, appropriately normalized, must 
lower bound the true partition function. As pairwise binary graphical models are log-supermodular 
if and only if they are attractive, combining our result with the observations of [5] resolves the 
conjecture of [2]. 

The key element in our proof, and the second contribution of this work, is a new variant of 
the "four functions" theorem that is specific to log-supermodular functions. We state and prove 
this variant in Section 3.1, and in Section 4.1 we use it to resolve the conjecture. As a final
contribution, we demonstrate that our variant of the "four functions" theorem has applications 
beyond log-supermodular functions: we use it to show that the Bethe partition function can also 
provide a lower bound on the number of independent sets in a bipartite graph. 

2 Undirected Graphical Models 

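Let $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ be a non-negative function. We say that $f$ factors with respect to a hypergraph
$G = (V, \mathcal{A})$, where $\mathcal{A} \subseteq 2^V$, if there exist potential functions $\phi_i : \{0,1\} \to \mathbb{R}_{\geq 0}$ for each $i \in V$ and
$\psi_\alpha : \{0,1\}^{|\alpha|} \to \mathbb{R}_{\geq 0}$ for each $\alpha \in \mathcal{A}$ such that

$$f(x) = \prod_{i \in V} \phi_i(x_i) \prod_{\alpha \in \mathcal{A}} \psi_\alpha(x_\alpha)$$

where $x_\alpha$ is the subvector of the vector $x$ indexed by the set $\alpha$.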

We will express the hypergraph $G$ as a bipartite graph that consists of a variable node for each
$i \in V$, a factor node for each $\alpha \in \mathcal{A}$, and an edge joining the factor node corresponding to $\alpha$ to the
variable node representing $i$ if $i \in \alpha$. This is typically referred to as the factor graph representation
of $G$.

Definition 2.1. A function $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ is log-supermodular if for all $x, y \in \{0,1\}^n$

$$f(x) f(y) \leq f(x \wedge y) f(x \vee y)$$

where $(x \wedge y)_i = \min\{x_i, y_i\}$ and $(x \vee y)_i = \max\{x_i, y_i\}$. Similarly, a function $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ is
log-submodular if for all $x, y \in \{0,1\}^n$

$$f(x) f(y) \geq f(x \wedge y) f(x \vee y).$$

Definition 2.2. A factorization of a function $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ over $G = (V, \mathcal{A})$ is log-supermodular
if for all $\alpha \in \mathcal{A}$, $\psi_\alpha(x_\alpha)$ is log-supermodular.

Every function that admits a log-supermodular factorization is necessarily log-supermodular as 
products of log-supermodular functions are easily seen to be log-supermodular, but the converse 
may not be true outside of special cases. If $|\alpha| \leq 2$ for each $\alpha \in \mathcal{A}$, then we call the factorization
pairwise. For any pairwise factorization, $f$ is log-supermodular if and only if $\psi_{ij}$ is log-supermodular
for each $i$ and $j$.

Pairwise graphical models such that $\psi_\alpha(x_\alpha)$ is log-supermodular for all $\alpha \in \mathcal{A}$ are referred to
as attractive graphical models. A generalization of attractive interactions to the non-pairwise case
is presented in [2]: for all $\alpha \in \mathcal{A}$, $\psi_\alpha$, when appropriately normalized, has non-negative central
moments.
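The definition of log-supermodularity can be checked by brute force on small examples. The following is a minimal sketch, not part of the original development: the helper `is_log_supermodular` and the two pairwise potentials are hypothetical names and values chosen only for illustration. It shows that a potential favoring agreement passes the test, while one favoring disagreement fails it.

```python
from itertools import product

def is_log_supermodular(f, n, tol=1e-12):
    """Brute-force test of f(x) f(y) <= f(x AND y) f(x OR y) on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            meet = tuple(min(a, b) for a, b in zip(x, y))  # componentwise min
            join = tuple(max(a, b) for a, b in zip(x, y))  # componentwise max
            if f(x) * f(y) > f(meet) * f(join) + tol:
                return False
    return True

def attractive(x):
    # Hypothetical potential: agreeing assignments get the larger weight.
    return 2.0 if x[0] == x[1] else 1.0

def repulsive(x):
    # Hypothetical potential: disagreeing assignments get the larger weight.
    return 1.0 if x[0] == x[1] else 2.0

attractive_ok = is_log_supermodular(attractive, 2)
repulsive_ok = is_log_supermodular(repulsive, 2)
print(attractive_ok, repulsive_ok)
```

The check enumerates all pairs of assignments, so it is only practical for very small $n$.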

2.1 Graph Covers 

Graph covers have played an important role in our understanding of graphical models [3 16].
Figure 1: An example of a graph cover: (a) a graph $G$; (b) one possible cover of $G$. The nodes in
the cover are labeled for the node that they copy in the base graph.



Definition 2.3. A graph $H$ covers a graph $G = (V, E)$ if there exists a graph homomorphism
$h : H \to G$ such that for all vertices $v \in G$ and all $w \in h^{-1}(v)$, $h$ maps the neighborhood $\partial w$ of $w$
in $H$ bijectively to the neighborhood $\partial v$ of $v$ in $G$. If $h(w) = v$, then we say that $w \in H$ is a copy of
$v \in G$. Further, $H$ is a $k$-cover of $G$ if every vertex of $G$ has exactly $k$ copies in $H$.

Roughly, if a graph $H$ covers a graph $G$, then $H$ looks locally the same as $G$. For an example of
a graph cover, see Figure 1.

For the factor graph corresponding to $G = (V, \mathcal{A})$, each $k$-cover consists of a variable node for
each of the $k|V|$ variables, a factor node for each of the $k|\mathcal{A}|$ factors, and an edge joining each copy
of $\alpha \in \mathcal{A}$ to a distinct copy of each $i \in \alpha$. To any $k$-cover $H = (V_H, \mathcal{A}_H)$ of $G$, we can associate a
collection of potentials: the potential at node $i \in V_H$ is equal to $\phi_{h(i)}$, the potential at node $h(i) \in G$,
and for each $\alpha \in \mathcal{A}_H$, we associate the potential $\psi_{h(\alpha)}$. In this way, we can construct a function
$f^H : \{0,1\}^{kn} \to \mathbb{R}_{\geq 0}$ such that $f^H$ factorizes over $H$.

Notice that if $f^G$ admits a log-supermodular factorization over $G$ and $H$ is a $k$-cover of $G$, then
$f^H$ admits a log-supermodular factorization over $H$.



2.2 Bethe Approximations 

For a function $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ that factorizes over $G = (V, \mathcal{A})$, we are interested in computing
the partition function $Z(G) = \sum_x f(x)$. In general, this is an NP-hard problem, but in practice,
algorithms based on variational approximations, such as belief propagation, produce reasonable
estimates in certain settings. One such variational approximation, the Bethe approximation at
temperature $T = 1$, is defined as follows:

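$$\log Z_B(G, \tau) = \sum_{i \in V} \sum_{x_i} \tau_i(x_i) \log \phi_i(x_i) + \sum_{\alpha \in \mathcal{A}} \sum_{x_\alpha} \tau_\alpha(x_\alpha) \log \psi_\alpha(x_\alpha)$$
$$\qquad - \sum_{i \in V} \sum_{x_i} \tau_i(x_i) \log \tau_i(x_i) - \sum_{\alpha \in \mathcal{A}} \sum_{x_\alpha} \tau_\alpha(x_\alpha) \log \frac{\tau_\alpha(x_\alpha)}{\prod_{i \in \alpha} \tau_i(x_i)}$$

for $\tau$ in the local marginal polytope,

$$\mathcal{T} = \Big\{\tau \geq 0 \;\Big|\; \forall \alpha \in \mathcal{A}, i \in \alpha, \sum_{x_{\alpha \setminus i}} \tau_\alpha(x_\alpha) = \tau_i(x_i) \text{ and } \forall i \in V, \sum_{x_i} \tau_i(x_i) = 1\Big\}.$$

The fixed points of the belief propagation algorithm correspond to stationary points of $\log Z_B(G, \tau)$
over $\mathcal{T}$, the set of pseudomarginals [1], and the Bethe partition function is defined to be the maximum
value achieved by this approximation over $\mathcal{T}$:

$$Z_B(G) = \max_{\tau \in \mathcal{T}} Z_B(G, \tau).$$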
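As a sanity check on the Bethe formula, the following minimal sketch evaluates $\log Z_B(G, \tau)$ term by term on a hypothetical model with two variables and a single pairwise factor. The potential values are illustrative assumptions. The sketch relies on the standard fact, not stated in this section, that on a tree-structured model the Bethe approximation evaluated at the true marginals equals $\log Z(G)$.

```python
from itertools import product
from math import log

# Hypothetical potentials for a model with two variables and one pairwise factor.
phi = [{0: 1.0, 1: 2.0}, {0: 3.0, 1: 1.0}]
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
states = list(product((0, 1), repeat=2))

def weight(x):
    return phi[0][x[0]] * phi[1][x[1]] * psi[x]

Z = sum(weight(x) for x in states)

# True marginals of the model, used here as the pseudomarginals tau.
tau_factor = {x: weight(x) / Z for x in states}
tau_node = [
    {s: sum(p for x, p in tau_factor.items() if x[i] == s) for s in (0, 1)}
    for i in (0, 1)
]

def log_bethe(tau_node, tau_factor):
    """Evaluate log Z_B(G, tau) for the single-factor model above."""
    value = 0.0
    for i in (0, 1):
        for s in (0, 1):
            value += tau_node[i][s] * log(phi[i][s])       # node energy term
            value -= tau_node[i][s] * log(tau_node[i][s])  # node entropy term
    for x in states:
        t = tau_factor[x]
        value += t * log(psi[x])                           # factor energy term
        # correction term: log of tau_alpha over the product of node marginals
        value -= t * log(t / (tau_node[0][x[0]] * tau_node[1][x[1]]))
    return value

log_zb = log_bethe(tau_node, tau_factor)
print(log_zb, log(Z))
```

On graphs with cycles the two printed values would generally differ, which is the gap the rest of the paper bounds.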

For a fixed factor graph $G$, we are interested in the relationship between the true partition
function, $Z(G)$, and the Bethe approximation corresponding to $G$, $Z_B(G)$. While, in general, $Z_B(G)$
can be either an upper or a lower bound on the true partition function, in this work, we address the
following conjecture of [2]:
Conjecture 2.4. If $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ admits a pairwise, log-supermodular factorization over
$G = (V, \mathcal{A})$, then $Z_B(G) \leq Z(G)$.

We resolve this conjecture in the affirmative, and show that it continues to hold for a larger class
of log-supermodular functions. Our results are based, primarily, on two observations: a variant of
the "four functions" theorem [7] and the following, recent, theorem of Vontobel [5]:

Theorem 2.5.

$$Z_B(G) = \limsup_{k \to \infty} \sqrt[k]{\sum_{H \in \mathcal{C}^k(G)} Z(H) / |\mathcal{C}^k(G)|}$$

where $\mathcal{C}^k(G)$ is the set of all $k$-covers of $G$.

Proof. See Theorem 27 of [5]. □

Theorem 2.5 suggests that a reasonable strategy for proving that $Z_B(G) \leq Z(G)$ would be to
show that $Z(H) \leq Z(G)^k$ for any $k$-cover $H$ of $G$. This is the strategy that we adopt in the
remainder of this work.



3 The "Four Functions" Theorem and Related Results 



Let $z^i$ be a function that computes the $i$th largest element of a collection. We will, abusively, denote
this function as $z^i(x^1, \ldots, x^k)$ for any collection of vectors $x^1, \ldots, x^k \in \mathbb{R}^n$. Here, $z^i(x^1, \ldots, x^k)$ is
the vector whose $j$th component is the $i$th largest element of $x^1_j, \ldots, x^k_j$ for each $j \in \{1, \ldots, n\}$. As
an example, for vectors $x^1, \ldots, x^k \in \{0,1\}^n$,

$$z^i(x^1, \ldots, x^k)_j = \Big\{\sum_{a=1}^k x^a_j \geq i\Big\}$$

where $\{\cdot \geq \cdot\}$ is one if the inequality is satisfied and zero otherwise.
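The operator $z^i$ is easy to state in code. The sketch below is illustrative only: the function names `z` and `z_binary` and the sample vectors are assumptions, not notation from the paper. It computes the componentwise $i$th largest element by sorting, and separately via the indicator form given for binary vectors, so that the two can be compared.

```python
def z(i, vectors):
    """i-th largest element (1-indexed), taken componentwise over a list of vectors."""
    n = len(vectors[0])
    return tuple(
        sorted((v[j] for v in vectors), reverse=True)[i - 1] for j in range(n)
    )

def z_binary(i, vectors):
    """Indicator form for 0/1 vectors: component j is 1 iff at least i vectors have a 1 there."""
    n = len(vectors[0])
    return tuple(int(sum(v[j] for v in vectors) >= i) for j in range(n))

# Hypothetical example with k = 3 binary vectors of length n = 3.
xs = [(1, 0, 1), (0, 0, 1), (1, 1, 0)]
for i in (1, 2, 3):
    print(i, z(i, xs), z_binary(i, xs))
```

For $k = 2$, $z^1(x, y)$ is the componentwise maximum $x \vee y$ and $z^2(x, y)$ is the componentwise minimum $x \wedge y$, which is how these operators connect to the lattice operations in Definition 2.1.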

The "four functions" theorem [7] is a general result concerning nonnegative functions over
distributive lattices. Many correlation inequalities, such as the FKG inequality, can be seen as special
cases of this theorem [8].

Theorem 3.1 ("Four Functions" Theorem). Let $f_1, f_2, f_3, f_4 : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ be nonnegative real-valued
functions. If for all $x, y \in \{0,1\}^n$,

$$f_1(x) f_2(y) \leq f_3(x \wedge y) f_4(x \vee y),$$

then

$$\Big[\sum_{x \in \{0,1\}^n} f_1(x)\Big] \Big[\sum_{x \in \{0,1\}^n} f_2(x)\Big] \leq \Big[\sum_{x \in \{0,1\}^n} f_3(x)\Big] \Big[\sum_{x \in \{0,1\}^n} f_4(x)\Big].$$
The following lemma is a direct consequence of the four functions theorem: 

Lemma 3.2. If $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ is log-supermodular, then every marginal of $f$ is also
log-supermodular.

The four functions theorem can be generalized to more than four functions, and a special case 
of the more general "2k functions" theorem is as follows O [ini HI] : 



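Theorem 3.3 ("2k Functions" Theorem). Let $f_1, \ldots, f_k : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ and $g_1, \ldots, g_k : \{0,1\}^n \to \mathbb{R}_{\geq 0}$
be nonnegative real-valued functions. If for all $x^1, \ldots, x^k \in \{0,1\}^n$,

$$\prod_{i=1}^k g_i(x^i) \leq \prod_{i=1}^k f_i(z^i(x^1, \ldots, x^k)), \qquad (1)$$

then

$$\prod_{i=1}^k \Big[\sum_{x \in \{0,1\}^n} g_i(x)\Big] \leq \prod_{i=1}^k \Big[\sum_{x \in \{0,1\}^n} f_i(x)\Big].$$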
3.1 A Variant of the "Four Functions" Theorem

A natural generalization of Theorem 3.3 would be to replace the product of functions on the
left-hand side of Equation (1) with an arbitrary function over $x^1, \ldots, x^k$. While the conclusion of the
theorem may not continue to hold for arbitrary choices of such a function, we will show that we can 
replace this product with an arbitrary log-supermodular function while preserving the conclusion 
of the theorem. The key property of log-supermodular functions that makes this possible is the 
following lemma: 

Lemma 3.4. If $g : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ is log-supermodular, then for any integer $k \geq 1$ and $x^1, \ldots, x^k \in \{0,1\}^n$,

$$\prod_{i=1}^k g(x^i) \leq \prod_{i=1}^k g(z^i(x^1, \ldots, x^k)).$$

Proof. This follows directly from the log-supermodularity of $g$. □
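For $k = 2$ the inequality of Lemma 3.4 is exactly the definition of log-supermodularity, since $z^1(x, y) = x \vee y$ and $z^2(x, y) = x \wedge y$. For larger $k$ it can be checked exhaustively on small instances. The sketch below is an illustration under stated assumptions: the function $g$ with $\log_2 g(x) = (x_1 + \ldots + x_n)^2$ is a hypothetical log-supermodular example, and `z` is a helper name introduced here.

```python
from itertools import product
from math import prod

def z(i, vectors):
    """i-th largest element (1-indexed), taken componentwise over a collection of vectors."""
    n = len(vectors[0])
    return tuple(
        sorted((v[j] for v in vectors), reverse=True)[i - 1] for j in range(n)
    )

def g(x):
    # Hypothetical log-supermodular function: log2 g(x) = (x_1 + ... + x_n)^2,
    # a convex function of the number of ones.
    s = sum(x)
    return 2 ** (s * s)

n, k = 3, 3
points = list(product((0, 1), repeat=n))

# Check prod_i g(x^i) <= prod_i g(z^i(x^1, ..., x^k)) for every k-tuple of points.
lemma_holds = all(
    prod(g(x) for x in xs) <= prod(g(z(i, xs)) for i in range(1, k + 1))
    for xs in product(points, repeat=k)
)
print(lemma_holds)
```

Integer powers of two are used so that every comparison is exact.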
The proof of our variant of the "2k functions" theorem uses the properties of weak majorizations:

Definition 3.5. A vector $x \in \mathbb{R}^n$ is weakly majorized by a vector $y \in \mathbb{R}^n$, denoted $x \prec_w y$, if

$$\sum_{i=1}^t z^i(x_1, \ldots, x_n) \leq \sum_{i=1}^t z^i(y_1, \ldots, y_n)$$

for $t \in \{1, \ldots, n\}$.

For the purposes of this paper, we will only need the following result concerning weak
majorizations:

Theorem 3.6. For $x, y \in \mathbb{R}^n$, $x \prec_w y$ if and only if

$$\sum_{i=1}^n \varphi(x_i) \leq \sum_{i=1}^n \varphi(y_i)$$

for all continuous increasing convex functions $\varphi : \mathbb{R} \to \mathbb{R}$.

Proof. See 3.C.1.b and 4.B.2 of [12]. □
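Weak majorization compares the partial sums of the largest entries of two vectors. A minimal sketch of the definition follows, with one numerical instance of the "only if" direction of Theorem 3.6 for the convex increasing function $t \mapsto 2^t$. The function name `weakly_majorized` and the sample vectors are assumptions made for illustration.

```python
def weakly_majorized(x, y, tol=1e-12):
    """Return True if x is weakly majorized by y (Definition 3.5)."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sum_x = sum_y = 0.0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        # The sum of the t largest entries of x must not exceed that of y.
        if sum_x > sum_y + tol:
            return False
    return True

# Hypothetical example vectors.
x = [1.0, 1.0, 1.0]
y = [3.0, 0.5, 0.0]
majorized = weakly_majorized(x, y)

# One instance of Theorem 3.6 with the convex increasing function t -> 2^t.
lhs = sum(2.0 ** t for t in x)
rhs = sum(2.0 ** t for t in y)
print(majorized, lhs <= rhs)
```

A single function can only illustrate the theorem, not verify it, since the statement quantifies over all continuous increasing convex functions.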

We now state and prove our variant of the 2k functions theorem in two pieces. First, we consider
the case where $n = 1$:

Lemma 3.7. Let $f_1, \ldots, f_k : \{0,1\} \to \mathbb{R}_{\geq 0}$ and $g : \{0,1\}^k \to \mathbb{R}_{\geq 0}$ be nonnegative real-valued
functions such that $g$ is log-supermodular. If for all $x^1, \ldots, x^k \in \{0,1\}$,

$$g(x^1, \ldots, x^k) \leq \prod_{i=1}^k f_i(z^i(x^1, \ldots, x^k)),$$

then

$$\sum_{x^1, \ldots, x^k} g(x^1, \ldots, x^k) \leq \prod_{i=1}^k \Big[\sum_{x \in \{0,1\}} f_i(x)\Big].$$

Proof. Let $G \in \mathbb{R}^{2^k}$ be the vector whose $2^k$ elements correspond to the $2^k$ distinct evaluations of
$g$. Similarly, let $F \in \mathbb{R}^{2^k}$ be the vector whose $2^k$ elements correspond to the $2^k$ distinct evaluations
of $f(x^1, \ldots, x^k) \triangleq \prod_{i=1}^k f_i(x^i)$. Let $\log G \triangleq (\log G_1, \ldots, \log G_{2^k})$ and $\log F \triangleq (\log F_1, \ldots, \log F_{2^k})$.
Our strategy will be to show that $\log G \prec_w \log F$. Then, by Theorem 3.6 and the fact that $2^x$ is
convex and increasing, we will have

$$\sum_{x^1, \ldots, x^k} g(x^1, \ldots, x^k) = \sum_{i=1}^{2^k} G_i \leq \sum_{i=1}^{2^k} F_i = \sum_{x^1, \ldots, x^k} \prod_{i=1}^k f_i(x^i)$$

as desired. We note that, by continuity arguments, this analysis holds even when some values of $g$
and $f$ are equal to zero. Further, let $G^c \in \mathbb{R}^{\binom{k}{c}}$ be the vector obtained from $G$ by only considering
assignments with exactly $c$ nonzero elements (i.e., $x^1 + \ldots + x^k = c$), and define $F^c$ similarly for $F$.
If we can show that

$$\prod_{m=1}^M z^m(G^c_1, \ldots, G^c_{\binom{k}{c}}) \leq \prod_{m=1}^M z^m(F^c_1, \ldots, F^c_{\binom{k}{c}})$$

for all $c \in \{0, \ldots, k\}$ and $M \leq \binom{k}{c}$, then we must have that $\log G \prec_w \log F$.

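Now, fix $c \in \{0, \ldots, k\}$, $T \in \{1, \ldots, \binom{k}{c}\}$, and let $V^c = \{u \in \{0,1\}^k \mid u_1 + \ldots + u_k = c\}$. Suppose
$v^1, \ldots, v^T \in V^c$ are $T$ distinct vectors. By Lemma 3.4, we must have

$$\prod_{t=1}^T g(v^t) \leq \prod_{t=1}^T g(z^t(v^1, \ldots, v^T)) \leq \prod_{t=1}^T f(\overline{z}^t)$$

where $\overline{z}^t_j \triangleq z^j(z^t(v^1, \ldots, v^T)_1, \ldots, z^t(v^1, \ldots, v^T)_k)$ for each $j \in \{1, \ldots, k\}$. Given any such $v^1, \ldots, v^T \in V^c$,
we will show how to construct distinct vectors $\overline{v}^1, \ldots, \overline{v}^T \in V^c$ such that
$\prod_{t=1}^T f(\overline{z}^t) = \prod_{t=1}^T f(\overline{v}^t)$. Consequently, we will have

$$\prod_{t=1}^T g(v^t) \leq \prod_{t=1}^T f(\overline{v}^t) \leq \prod_{m=1}^T z^m(F^c_1, \ldots, F^c_{\binom{k}{c}}).$$

As our construction will work for any choice of distinct vectors $v^1, \ldots, v^T \in V^c$, it will work, in
particular, for the $T$ distinct vectors in $V^c$ that maximize $\prod_{t=1}^T g(v^t)$; the lemma will then
follow as a consequence of our previous arguments.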

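We now describe how to construct the vectors $\overline{v}^1, \ldots, \overline{v}^T$ from the vectors $v^1, \ldots, v^T$. Let
$A \in \mathbb{R}^{k \times T}$ be the matrix whose $i$th column is given by the vector $v^i$. Construct $\overline{A} \in \mathbb{R}^{k \times T}$ from $A$ by
swapping the rows of $A$ so that for each $i < j \in \{1, \ldots, k\}$, $\sum_p \overline{A}_{ip} \geq \sum_p \overline{A}_{jp}$. Intuitively, the
first row of $\overline{A}$ corresponds to the row of $A$ with the most nonzero elements, the second row of $\overline{A}$
corresponds to the row of $A$ with the second largest number of nonzero elements, and so on. Let
$\overline{v}^1, \ldots, \overline{v}^T$ be the columns of $\overline{A}$. Notice that $\overline{v}^1, \ldots, \overline{v}^T$ are distinct vectors in $V^c$ and that, by
construction, $z^j(z^t(v^1, \ldots, v^T)_1, \ldots, z^t(v^1, \ldots, v^T)_k) = z^t(\overline{v}^1, \ldots, \overline{v}^T)_j$ for each $j \in \{1, \ldots, k\}$ and
$t \in \{1, \ldots, T\}$. Therefore, we must have

$$\prod_{t=1}^T g(v^t) \leq \prod_{t=1}^T g(z^t(v^1, \ldots, v^T)) \leq \prod_{t=1}^T f(z^t(\overline{v}^1, \ldots, \overline{v}^T)) = \prod_{t=1}^T f(\overline{v}^t)$$

where the equality follows from the definition of $f$ as a product of the $f_i$. In addition, the vector
$z^t(\overline{v}^1, \ldots, \overline{v}^T)$ is simply a permuted version of the vector $z^t(v^1, \ldots, v^T)$, which means that their $j$th
largest elements must agree:

$$\overline{z}^t_j = z^j(z^t(v^1, \ldots, v^T)_1, \ldots, z^t(v^1, \ldots, v^T)_k)$$
$$\quad = z^j(z^t(\overline{v}^1, \ldots, \overline{v}^T)_1, \ldots, z^t(\overline{v}^1, \ldots, \overline{v}^T)_k)$$
$$\quad = z^t(\overline{v}^1, \ldots, \overline{v}^T)_j.$$

Therefore,

$$\prod_{t=1}^T g(v^t) \leq \prod_{t=1}^T f(\overline{z}^t) = \prod_{t=1}^T f(z^t(\overline{v}^1, \ldots, \overline{v}^T)) = \prod_{t=1}^T f(\overline{v}^t)$$

and the lemma follows as a consequence. □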
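The row-sorting construction in the proof can be illustrated on a small instance. The sketch below is a hedged illustration, not the author's code: `sort_rows` and `z` are hypothetical helper names, and the input is one example with $k = 3$, $c = 2$, $T = 2$. It reorders the rows of the matrix whose columns are the given vectors by decreasing row sum, and then checks that the new columns stay distinct, stay in $V^c$, and satisfy the identity relating $z^t$ of the new columns to the sorted entries of $z^t$ of the old ones.

```python
def z(i, vectors):
    """i-th largest element (1-indexed), taken componentwise over a list of vectors."""
    n = len(vectors[0])
    return tuple(
        sorted((v[j] for v in vectors), reverse=True)[i - 1] for j in range(n)
    )

def sort_rows(vs):
    """Treat vs as the columns of A, reorder rows by decreasing row sum, return the new columns."""
    k = len(vs[0])
    order = sorted(range(k), key=lambda r: -sum(v[r] for v in vs))
    return [tuple(v[r] for r in order) for v in vs]

# Two distinct vectors in V^c for k = 3 and c = 2 (hypothetical example).
vs = [(0, 1, 1), (1, 0, 1)]
bars = sort_rows(vs)
T = len(vs)

# z^t of the new columns should equal z^t of the old columns sorted in decreasing order.
identity_holds = all(
    z(t, bars) == tuple(sorted(z(t, vs), reverse=True)) for t in range(1, T + 1)
)
print(bars, identity_holds)
```

Because the same row permutation is applied to every column, distinct columns remain distinct and each column keeps exactly $c$ ones.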
In the case that $n = 1$ and $k \geq 1$, this lemma is a more general result than the 2k functions
theorem: if $g(x^1, \ldots, x^k) = \prod_i g_i(x^i)$ for $g_1, \ldots, g_k : \{0,1\} \to \mathbb{R}_{\geq 0}$, then $g$ is log-supermodular. As
in the proof of the 2k functions theorem, the general theorem for $n \geq 1$ follows by induction on $n$:

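Theorem 3.8. Let $f_1, \ldots, f_k : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ and $g : \{0,1\}^{kn} \to \mathbb{R}_{\geq 0}$ be nonnegative real-valued
functions such that $g$ is log-supermodular. If for all $x^1, \ldots, x^k \in \{0,1\}^n$,

$$g(x^1, \ldots, x^k) \leq \prod_{i=1}^k f_i(z^i(x^1, \ldots, x^k)),$$

then

$$\sum_{x^1, \ldots, x^k} g(x^1, \ldots, x^k) \leq \prod_{i=1}^k \Big[\sum_{x \in \{0,1\}^n} f_i(x)\Big].$$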

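Proof. We will prove the result for general $k$ and $n$ by induction on $n$. The base case of $n = 1$
follows from Lemma 3.7. Now, for $n \geq 2$, suppose that the result holds for $k \geq 1$ and $n - 1$, and let
$f_1, \ldots, f_k : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ and $g : \{0,1\}^{kn} \to \mathbb{R}_{\geq 0}$ be nonnegative real-valued functions such that $g$
is log-supermodular.

Define $f'_i : \{0,1\}^{n-1} \to \mathbb{R}_{\geq 0}$ and $g' : \{0,1\}^{k(n-1)} \to \mathbb{R}_{\geq 0}$ as

$$f'_i(y) = \sum_{s \in \{0,1\}} f_i(y, s)$$
$$g'(y^1, \ldots, y^k) = \sum_{s^1, \ldots, s^k \in \{0,1\}} g(y^1, s^1, \ldots, y^k, s^k)$$

Notice that $g'$ is log-supermodular because it is the marginal of a log-supermodular function (see
Lemma 3.2). If we can show that

$$g'(y^1, \ldots, y^k) \leq \prod_{i=1}^k f'_i(z^i(y^1, \ldots, y^k))$$

for all $y^1, \ldots, y^k \in \{0,1\}^{n-1}$, then the result will follow by induction on $n$. To show this, fix
$y^1, \ldots, y^k \in \{0,1\}^{n-1}$ and define $\overline{f}_i : \{0,1\} \to \mathbb{R}_{\geq 0}$ and $\overline{g} : \{0,1\}^k \to \mathbb{R}_{\geq 0}$ as

$$\overline{f}_i(s) = f_i(z^i(y^1, \ldots, y^k), s)$$
$$\overline{g}(s^1, \ldots, s^k) = g(y^1, s^1, \ldots, y^k, s^k)$$

We can easily check that $\overline{g}(s^1, \ldots, s^k)$ is log-supermodular and that $\overline{g}(s^1, \ldots, s^k) \leq \prod_{i=1}^k \overline{f}_i(z^i(s^1, \ldots, s^k))$
for all $s^1, \ldots, s^k \in \{0,1\}$. Hence, by Lemma 3.7,

$$g'(y^1, \ldots, y^k) = \sum_{s^1, \ldots, s^k} \overline{g}(s^1, \ldots, s^k) \leq \prod_{i=1}^k \sum_{s \in \{0,1\}} \overline{f}_i(s) = \prod_{i=1}^k f'_i(z^i(y^1, \ldots, y^k))$$

which completes the proof of the theorem. □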

4 Graph Covers and the Partition Function 

In this section, we show how to apply Theorem 3.8 in order to resolve Conjecture 2.4. In addition, we
show that the theorem can be applied, more generally, to yield similar results for a class of functions
that can be converted into log-supermodular functions by a change of variables.
4.1 Log-supermodularity and Graph Covers

The following theorem follows easily from Theorem 3.8.

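Theorem 4.1. If $f^G : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ admits a log-supermodular factorization over $G = (V, \mathcal{A})$,
then for any $k$-cover, $H$, of $G$, $Z(H) \leq Z(G)^k$.

Proof. Let $H$ be a $k$-cover of $G$. Divide the vertices of $H$ into $k$ sets $S_1, \ldots, S_k$ such that each set
contains exactly one copy of each vertex $i \in V$. Let the assignments to the variables in the set $S_i$
be denoted by the vector $x^i$.

For each $\alpha \in \mathcal{A}$, let $y^i_\alpha$ denote the assignment to the $i$th copy of $\alpha$ by the elements of $x^1, \ldots, x^k$.
By Lemma 3.4,

$$\prod_{i=1}^k \psi_\alpha(y^i_\alpha) \leq \prod_{i=1}^k \psi_\alpha(z^i(y^1_\alpha, \ldots, y^k_\alpha)) = \prod_{i=1}^k \psi_\alpha(z^i(x^1_\alpha, \ldots, x^k_\alpha)) = \prod_{i=1}^k \psi_\alpha(z^i(x^1, \ldots, x^k)_\alpha).$$

From this, we can conclude that $f^H(x^1, \ldots, x^k) \leq \prod_{i=1}^k f^G(z^i(x^1, \ldots, x^k))$. Now, by Theorem 3.8,

$$Z(H) = \sum_{x^1, \ldots, x^k} f^H(x^1, \ldots, x^k) \leq \prod_{i=1}^k \Big[\sum_{x^i} f^G(x^i)\Big] = Z(G)^k.$$

□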
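Theorem 4.1 can be checked by enumeration on a small instance. The sketch below is an illustration under stated assumptions, not part of the proof: the base graph is a triangle with a hypothetical pairwise potential that assigns weight $w$ to agreeing endpoints and weight 1 otherwise, and the 2-covers considered are the 6-cycle (vertex $v + 3$ is the second copy of vertex $v$) and two disjoint triangles. For $w = 2$ the potential is log-supermodular. For $w = 1/2$ it is not, and the sketch shows that the inequality can then fail.

```python
from itertools import product

def partition_function(num_vars, edges, w):
    """Sum over all binary assignments of the product of pairwise edge weights."""
    total = 0
    for x in product((0, 1), repeat=num_vars):
        weight = 1
        for i, j in edges:
            # Weight w when the endpoints agree, weight 1 otherwise.
            weight *= w if x[i] == x[j] else 1
        total += weight
    return total

triangle = [(0, 1), (1, 2), (2, 0)]
# Two 2-covers of the triangle; vertex v + 3 is the second copy of vertex v.
six_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

# Attractive case (w = 2): Z(H) <= Z(G)^2 for both covers.
z_g = partition_function(3, triangle, 2)
z_cycle = partition_function(6, six_cycle, 2)
z_split = partition_function(6, two_triangles, 2)
print(z_g, z_cycle, z_split)

# Repulsive case (w = 1/2): the potential is not log-supermodular and the bound fails.
z_g_rep = partition_function(3, triangle, 0.5)
z_cycle_rep = partition_function(6, six_cycle, 0.5)
print(z_g_rep ** 2, z_cycle_rep)
```

The disjoint-union cover always attains $Z(G)^k$ exactly, so the content of the theorem lies in the connected covers such as the 6-cycle.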

This theorem settles the conjecture of [2] for any log-supermodular function that admits a
pairwise binary factorization. Indeed, the above theorem solves the problem for a larger class of
log-supermodular graphical models:

Corollary 4.2. If $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ admits a log-supermodular factorization over $G = (V, \mathcal{A})$, then
$Z_B(G) \leq Z(G)$.

Proof. This follows directly from Theorem 4.1 and Theorem 2.5. □

As the value of the Bethe approximation at any of the fixed points of BP is always a lower bound
on $Z_B(G)$, the conclusion of the corollary holds for any fixed point of the BP algorithm as well.

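Corollary 4.3. If $f : \{0,1\}^n \to \mathbb{R}_{\geq 0}$ admits a log-supermodular factorization over $G = (V, \mathcal{A})$, then

$$Z_B(G) = \lim_{k \to \infty} \sqrt[k]{\sum_{H \in \mathcal{C}^k(G)} Z(H) / |\mathcal{C}^k(G)|}$$

where $\mathcal{C}^k(G)$ is the set of all $k$-covers of $G$.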

Proof. By Theorem 4.1 and the definition of $Z_B$, $Z(H) \geq Z_B(H) \geq Z_B(G)^k$ for any $k$-cover $H$ of $G$.
The corollary then follows from Theorem 2.5. □

4.2 Beyond Log-supermodularity 

While Theorem 4.1 is a statement only about log-supermodular functions, we can use Theorem 3.8
to infer similar results even when the function under consideration is not log-supermodular. As an
example of such an application, we consider the problem of counting the number of independent sets
in a given graph, $G = (V, E)$. An independent set, $I \subseteq V$, in $G$ is a subset of the vertices such that
no two adjacent vertices are in $I$. We define the following function:

$$I^G(x_1, \ldots, x_{|V|}) = \prod_{(i,j) \in E} (1 - x_i x_j)$$
which is equal to one if the nonzero $x_i$'s define an independent set and zero otherwise. As every
potential function depends on at most two variables, $I^G$ factorizes over the graph $G = (V, E)$. Notice
that $I^G$ is log-submodular, not log-supermodular.

In this section, we will focus on bipartite graphs: $G = (V, E)$ is bipartite if we can partition the
vertex set into two sets $A \subseteq V$ and $B = V \setminus A$ such that $A$ and $B$ are independent sets. Examples
of bipartite graphs include cycles of even length, trees, and grid graphs. We will denote bipartite graphs as
$G = (A, B, E)$.

For any bipartite graph $G = (A, B, E)$, $I^G$ can be converted into a log-supermodular graphical
model by a simple change of variables. Define $y_a = x_a$ for all $a \in A$ and $y_b = 1 - x_b$ for all $b \in B$.
We then have

$$I^G(x_1, \ldots, x_{|V|}) = \prod_{(i,j) \in E} (1 - x_i x_j)$$
$$\quad = \prod_{(a,b) \in E, a \in A, b \in B} (1 - y_a (1 - y_b))$$
$$\quad \triangleq \overline{I}^G(y_1, \ldots, y_{|V|}).$$

$\overline{I}^G$ admits a log-supermodular factorization over $G$ and $\sum_y \overline{I}^G(y) = \sum_x I^G(x)$. Similarly, for any
graph cover $H$ of $G$, we have $\sum_y \overline{I}^H(y) = \sum_x I^H(x)$. Consequently, by Theorem 3.8 we can
conclude that $Z(G) \geq Z_B(G)$.
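The change of variables can be verified on a small bipartite graph. The sketch below is illustrative only: the graph (a path on three vertices with $A = \{0, 2\}$ and $B = \{1\}$) and the function names are assumptions. It checks that the transformed function sums to the same count of independent sets, and that the transformed edge potential $1 - y_a(1 - y_b)$ satisfies the log-supermodularity inequality on the one pair of incomparable assignments.

```python
from itertools import product

# Hypothetical bipartite graph: the path 0 - 1 - 2 with A = {0, 2} and B = {1}.
edges = [(0, 1), (2, 1)]  # each edge is listed as (a, b) with a in A and b in B
n = 3

def indicator(x):
    """I^G(x): 1 if the nonzero entries of x form an independent set, else 0."""
    value = 1
    for a, b in edges:
        value *= 1 - x[a] * x[b]
    return value

def transformed(y):
    """The same function after substituting y_a = x_a and y_b = 1 - x_b."""
    value = 1
    for a, b in edges:
        value *= 1 - y[a] * (1 - y[b])
    return value

def edge_potential(ya, yb):
    return 1 - ya * (1 - yb)

count_x = sum(indicator(x) for x in product((0, 1), repeat=n))
count_y = sum(transformed(y) for y in product((0, 1), repeat=n))

# For two binary variables, (0, 1) and (1, 0) are the only incomparable pair,
# so this single inequality decides log-supermodularity of the edge potential.
edge_ok = (
    edge_potential(0, 1) * edge_potential(1, 0)
    <= edge_potential(0, 0) * edge_potential(1, 1)
)
print(count_x, count_y, edge_ok)
```

The substitution is a bijection on assignments, which is why the two sums agree.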

Similar observations can, for example, be used to show that the Bethe partition function provides 
a lower bound on the true partition function for other problems that factor over pairwise bipartite 
graphical models (e.g., the antiferromagnetic Ising model on a grid, counting the number of vertex 
covers of a bipartite graph, counting the number of satisfying assignments of a monotone 2-SAT 
instance whose corresponding graphical structure is bipartite). 

5 Conclusions 

While the results presented above were discussed in the case that the temperature parameter, T, was 
equal to one, they easily extend to any $T > 0$ (as exponentiation preserves log-supermodularity in
this case). Hence, all of the bounds discussed above can be extended to the problem of maximizing a
log-supermodular function. In particular, the inequality in Theorem 4.1 suggests that the maximizing
assignment on any graph cover must correspond to a lift of a maximizing assignment on the base 
graph. 

This work also suggests a number of directions for future research. While the above work provides 
lower bounds on the partition function, similar ideas may be able to provide upper bounds as well. 
We note that related work on the Bethe approximation for permanents has already begun to explore 
these possibilities [TB] [2]. Similarly, an analog of Theorem 3.8 for log-submodular functions may
also be useful in the pursuit of upper bounds. The primary difficulty is that marginal distributions 
of log-submodular functions are not necessarily log-submodular, but perhaps upper bounds can be 
obtained when restricting to families of log-submodular functions all of whose marginals are also 
log-submodular. 